Histopathology is the microscopic examination of tissue samples (from biopsies) to study manifestations of disease.
— Wikipedia

The tissue is processed by embedding in paraffin wax, microtomy, and staining with dyes.
We're going to pull primarily from scikit-image, a fantastic Python library for image processing.
We'll also use parts of scipy's ndimage module.
Much like in other areas of traditional machine learning, a lot of image processing is about feature engineering: representing the information you have in a way that is more digestible and processable by a computer. Feature engineering is as much art as science.
# Load the data, and look for a hint of where to start first
import numpy as np
from skimage import io, transform, color, exposure, filters, morphology

widefield_image_file = "data/Sheep11-4x-77.jpg"
image = transform.rescale(io.imread(widefield_image_file), 0.25, channel_axis=-1)
plot_image(image)
The first channel is potentially promising, but those veins might cause issues. Instead of thinking about this in terms of RGB, maybe we can break the image up into its dyes: haematoxylin and eosin.
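Colour deconvolution treats each pixel's optical density (the negative log of its transmitted intensity) as a linear mix of the stain vectors, and unmixes it by inverting the stain matrix. Here is a minimal numpy sketch of that idea; the stain vectors are the standard published Ruifrok-Johnston values (the matrix scikit-image exposes as `color.rgb_from_hed`), while the `colour_deconv_matrix` used in the next cell is assumed to be defined earlier in the notebook.

```python
import numpy as np

# Ruifrok-Johnston stain vectors: each row is the RGB absorption
# profile of one dye (haematoxylin, eosin, DAB).
rgb_from_stains = np.array([
    [0.65, 0.70, 0.29],   # haematoxylin
    [0.07, 0.99, 0.11],   # eosin
    [0.27, 0.57, 0.78],   # DAB
])
stains_from_rgb = np.linalg.inv(rgb_from_stains)

def deconvolve(rgb):
    """Unmix an (H, W, 3) RGB image in [0, 1] into stain concentrations."""
    od = -np.log10(np.maximum(rgb, 1e-6))  # optical density per channel
    return od @ stains_from_rgb

# A pixel stained purely with haematoxylin at concentration 0.8...
pixel = 10.0 ** (-0.8 * rgb_from_stains[0])
# ...deconvolves to ~0.8 in the H channel and ~0 elsewhere.
conc = deconvolve(pixel.reshape(1, 1, 3)).squeeze()
```

`separate_stains` does essentially this unmixing (with some extra clipping and scaling), which is why the channels that come out of it correspond to individual dyes rather than colours.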
# Perform a colour deconvolution
heu = color.separate_stains(image, colour_deconv_matrix).astype(float)
plot_image(heu)
The middle channel here looks like a good start. Let's boost the contrast and improve our dynamic range using histogram equalisation.
# Apply CLAHE on the haematoxylin channel
rescaled = exposure.rescale_intensity(heu[:, :, 1], out_range=(0, 1))
equalised_hist = exposure.equalize_adapthist(rescaled)
plot_image(np.dstack((image, equalised_hist)))
Haematoxylin stains nuclei, and zones of inflammation are just high concentrations of cells, each of which contains a nucleus. We can now pretty easily pick out the zones of inflammation. Still, we need to differentiate between the inflammation and the background, which also contains lots of nuclei.
# Apply Gaussian blur
blurred = filters.gaussian(equalised_hist, 7)
plot_image(np.dstack((image, blurred)))
Applying a blur helped to even things out a little. We're at a point where the inflammation is quite bright, and the background is a medium grey. Let's crank up the contrast to stretch our dynamic range a little.
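scikit-image documents `adjust_sigmoid` as applying O = 1 / (1 + exp(gain * (cutoff - I))), with a default gain of 10: intensities below the cutoff get pushed towards black, and intensities above it towards white. A standalone numpy sketch of that curve, using the same cutoff of 0.6 as in the next cell:

```python
import numpy as np

def sigmoid_contrast(x, cutoff=0.5, gain=10):
    """Sigmoid contrast adjustment: 1 / (1 + exp(gain * (cutoff - x)))."""
    return 1.0 / (1.0 + np.exp(gain * (cutoff - x)))

x = np.linspace(0, 1, 11)
y = sigmoid_contrast(x, cutoff=0.6)
```

With `cutoff=0.6`, our medium-grey background (around 0.5) is squashed towards black, while the bright inflammatory zones above 0.6 are stretched towards white, which is exactly the separation we want before thresholding.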
# Sigmoid transform for contrast
contrast = exposure.adjust_sigmoid(blurred, cutoff=0.6)
plot_image(np.dstack((image, contrast)))
This looks much more promising now. We're probably at a point where we can threshold again.
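`threshold_local` computes, for each pixel, a threshold from that pixel's own neighbourhood (by default a Gaussian-weighted mean over a `block_size` window) minus `offset`, so with `offset=-0.1` a pixel must sit at least 0.1 above its local mean to pass. A rough sketch of the idea, swapping skimage's Gaussian weighting for a plain box mean via scipy's `uniform_filter`:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_threshold(img, size, offset=0.0):
    """Binarise against a local box-mean threshold: img > mean - offset."""
    return img > uniform_filter(img, size) - offset

# A bright blob on a slowly varying background: a single global
# threshold would also pick up the bright end of the gradient, but
# the local threshold keeps only pixels well above their
# neighbourhood mean.
background = np.linspace(0.0, 0.5, 100).reshape(1, -1).repeat(100, axis=0)
img = background.copy()
img[40:60, 40:60] += 0.4
blob = local_threshold(img, size=31, offset=-0.1)
```

This adaptivity is why a local threshold works here despite the uneven illumination and staining across the slide.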
# Apply an adaptive threshold
thresholded = contrast > filters.threshold_local(contrast, 351, offset=-0.1)
plot_image(np.dstack((image, thresholded)))
We now have something that matches fairly well. However, the contours of the inflammatory zones aren't very "natural", and the zones contain holes. We can clean this up using some morphological operations.
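A closing is a dilation followed by an erosion with the same structuring element: the dilation fills in holes and gaps narrower than the element, and the erosion restores the outer boundary (the maximum filter used below is itself just a grey-scale dilation). A toy demo with scipy's `binary_closing`, which behaves like skimage's `morphology.closing` on a boolean image:

```python
import numpy as np
from scipy.ndimage import binary_closing

# A solid square with a small hole punched in the middle.
square = np.zeros((15, 15), dtype=bool)
square[3:12, 3:12] = True
square[7, 7] = False

# Closing with a 3x3 element fills the hole without
# changing the square's outer boundary.
closed = binary_closing(square, structure=np.ones((3, 3)))
```

The disk-shaped structuring element used in the next cell does the same thing at a larger scale, which is also what smooths out the "unnatural" contours.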
# Use a maximum filter followed by a binary closing to fill any holes
from scipy.ndimage import maximum_filter

enlarged = maximum_filter(thresholded, 5)
inflammation = morphology.closing(enlarged, morphology.disk(11))
plot_image(np.dstack((image, inflammation)), titles=["Original", "Automated image processing"])
# Load up the human mask and visualise everything together
mask = transform.rescale(io.imread("data/Sheep11-4x-77.jpg_mask.jpg"), 0.25)
all_images = np.dstack((image, mask, inflammation))
plot_image(all_images, titles=["Original", "Human-masked", "Automatic"])
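The comparison above is only visual. To put a number on the agreement between the automated mask and the human one, a common choice is the Dice coefficient: twice the intersection over the sum of the two areas. A small helper, shown here on toy boolean masks (`inflammation` and a binarised `mask` could be compared the same way):

```python
import numpy as np

def dice(a, b):
    """Dice coefficient between two boolean masks: 2|A intersect B| / (|A| + |B|)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    denom = a.sum() + b.sum()
    return 2.0 * (a & b).sum() / denom if denom else 1.0

# Toy example: two offset 6x6 squares overlapping in a 4x4 region.
a = np.zeros((10, 10), bool); a[2:8, 2:8] = True
b = np.zeros((10, 10), bool); b[4:10, 4:10] = True
score = dice(a, b)  # 2 * 16 / (36 + 36)
```

Dice ranges from 0 (no overlap) to 1 (identical masks), and unlike raw pixel accuracy it isn't inflated by the large empty background.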


This is a quick experiment with a type of neural network called a convolutional encoder-decoder. They're among the state-of-the-art neural network architectures for semantic segmentation: breaking images down into different parts based on their content.
Over the weekend, I trained a (very untuned) convolutional encoder-decoder. Let's see what it can do.
# Let's use a convolutional encoder-decoder network to find inflammatory zones
from keras.models import load_model
model = load_model("models/histo_convnet.hdf5")
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_206 (Conv2D)          (None, 432, 648, 32)      896
_________________________________________________________________
conv2d_207 (Conv2D)          (None, 432, 648, 32)      9248
_________________________________________________________________
conv2d_208 (Conv2D)          (None, 432, 648, 32)      9248
_________________________________________________________________
max_pooling2d_37 (MaxPooling (None, 216, 324, 32)      0
_________________________________________________________________
conv2d_209 (Conv2D)          (None, 216, 324, 64)      18496
_________________________________________________________________
conv2d_210 (Conv2D)          (None, 216, 324, 64)      36928
_________________________________________________________________
conv2d_211 (Conv2D)          (None, 216, 324, 64)      36928
_________________________________________________________________
max_pooling2d_38 (MaxPooling (None, 108, 162, 64)      0
_________________________________________________________________
conv2d_212 (Conv2D)          (None, 108, 162, 256)     147712
_________________________________________________________________
conv2d_213 (Conv2D)          (None, 108, 162, 256)     590080
_________________________________________________________________
max_pooling2d_39 (MaxPooling (None, 54, 81, 256)       0
_________________________________________________________________
conv2d_214 (Conv2D)          (None, 54, 81, 512)       1180160
_________________________________________________________________
conv2d_215 (Conv2D)          (None, 54, 81, 512)       2359808
_________________________________________________________________
up_sampling2d_37 (UpSampling (None, 108, 162, 512)     0
_________________________________________________________________
conv2d_216 (Conv2D)          (None, 108, 162, 256)     1179904
_________________________________________________________________
conv2d_217 (Conv2D)          (None, 108, 162, 256)     590080
_________________________________________________________________
up_sampling2d_38 (UpSampling (None, 216, 324, 256)     0
_________________________________________________________________
conv2d_218 (Conv2D)          (None, 216, 324, 64)      147520
_________________________________________________________________
conv2d_219 (Conv2D)          (None, 216, 324, 64)      36928
_________________________________________________________________
conv2d_220 (Conv2D)          (None, 216, 324, 64)      36928
_________________________________________________________________
up_sampling2d_39 (UpSampling (None, 432, 648, 64)      0
_________________________________________________________________
conv2d_221 (Conv2D)          (None, 432, 648, 32)      18464
_________________________________________________________________
conv2d_222 (Conv2D)          (None, 432, 648, 32)      9248
_________________________________________________________________
conv2d_223 (Conv2D)          (None, 432, 648, 32)      9248
_________________________________________________________________
conv2d_224 (Conv2D)          (None, 432, 648, 1)       289
=================================================================
Total params: 6,418,113
Trainable params: 6,418,113
Non-trainable params: 0
_________________________________________________________________
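The output shapes in that summary are pure bookkeeping: three 2x2 max-poolings take the 432x648 input down to 54x81 at the bottleneck, and three 2x upsamplings bring it back, while the convolutions only change the channel count. A quick sanity check of the arithmetic (note that both spatial dimensions must be divisible by 8 for the decoder to restore the input size exactly):

```python
def pool(shape):
    """Spatial shape after a 2x2 max-pooling."""
    return (shape[0] // 2, shape[1] // 2)

def upsample(shape):
    """Spatial shape after a 2x upsampling."""
    return (shape[0] * 2, shape[1] * 2)

shapes = [(432, 648)]
for _ in range(3):            # encoder: three poolings
    shapes.append(pool(shapes[-1]))
for _ in range(3):            # decoder: three upsamplings
    shapes.append(upsample(shapes[-1]))
```

The bottleneck, (54, 81), is where the network is forced to summarise the slide, which is what lets it learn region-level concepts like "inflammatory zone" rather than per-pixel colour rules.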
# Use the convnet to find the inflammatory zones, and compare with our work so far
nn_image = transform.rescale(image, 0.5, channel_axis=-1)
pred = model.predict(nn_image.reshape(1, *nn_image.shape)).squeeze()
plot_grid((image, mask, inflammation, pred), ["Original", "Human-masked", "Classical", "ConvNet"])
scikit-image is a fantastic library with excellent documentation and a great API.
Keras is my library of choice for quickly developing neural networks, including the one trained above.
scipy's ndimage module contains a number of very useful image-processing routines, such as Fourier-domain filters, binary morphological operations, and measurement tools.